Round-trip testing POC #180

Draft
Nikil-Shyamsunder wants to merge 5 commits into main from round-trip-testing

Collaborator

@Nikil-Shyamsunder Nikil-Shyamsunder commented Feb 13, 2026

This isn't integrated with snapshot testing or CI yet. The obvious ways I could think of to integrate this with Turnt seemed clunky or required more complex changes, so for now I hand-rolled much of the logic to collect the tests and parse the arguments (which makes it kinda janky). I don't expect this to go into the main branch via this PR, but if this is valuable we can figure out how to do that properly later.

After implementing this, I found a few potential "bugs" in the interpreter+monitor.

How the round-trip test works:

For each .tx file where the interpreter succeeds:

  1. Run the interpreter to generate an FST waveform
  2. Run the monitor on that FST with the same .prot file
  3. Check if the monitor succeeds
  4. TODO: actually compare the monitor's output against the interpreter's .tx. We could do this now, but there are slight differences in the formatting that maybe we want to consider changing first?
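
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/roundtrip.py: `run_interpreter` and `run_monitor` are hypothetical callables standing in for the real command invocations, whose command lines are not shown in this PR description.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RoundTripReport:
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def round_trip(
    tx_files: List[str],
    run_interpreter: Callable[[str], Optional[str]],
    run_monitor: Callable[[str, str], bool],
) -> RoundTripReport:
    """Run each .tx file through the interpreter, then the monitor.

    run_interpreter (hypothetical): returns the path of the generated FST,
    or None if the transactions did not complete successfully.
    run_monitor (hypothetical): takes the FST path and the .tx path (so it
    can locate the matching .prot file) and returns True on success.
    """
    report = RoundTripReport()
    for tx in tx_files:
        fst = run_interpreter(tx)
        if fst is None:
            # Interpreter did not succeed: nothing to round-trip.
            report.skipped.append(tx)
        elif run_monitor(fst, tx):
            report.passed.append(tx)
        else:
            report.failed.append(tx)
    return report
```

Passing the two tools in as callables keeps the collection logic testable with stubs, independent of how the binaries are actually invoked.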

The script is at scripts/roundtrip.py and the generated output is in scripts/rountrip.out

Current output:

=== Round-trip results ===
  Passed:  28 / 33
  Failed:  5 / 33
  Skipped: 37 (transactions don't complete successfully)

Monitor failures (all are expected!):

  --- protocols/tests/adders/adder_d1/busy_wait_pass.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

  --- protocols/tests/adders/adder_d1/loop_with_assigns.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

  --- protocols/tests/adders/adder_d1/nested_busy_wait.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

  --- protocols/tests/fifo/push_pop_loop_empty.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

  --- protocols/tests/fifo/push_pop_loop_not_empty.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

Monitor bugs found and fixed (kinda?)

1. FST files with no time entries crash fst-reader (combinational-only designs)

Affected tests: add_combinational.tx, passthrough_combdep.tx

Root cause: Designs like add_d0 are purely combinational. When the interpreter runs a single cycle on them, the generated FST has no time entries. The fst-reader crate then panics at time_chain[0] (index out of bounds on an empty vec).

Quick Fix: Added an extra sim_step() at the end of execute_todos() in protocols/src/scheduler.rs so the FST always has at least one time entry. This empty cycle, from what I can tell, doesn't break anything for the monitor?

2. Monitor panics when a protocol argument is never mapped to a pin

Affected tests: add_combinational.tx (protocol add_combinational_illegal_observation_in_conditional has in b: u32 but does DUT.b := X, so b is never mapped to a trace value).

There are a few things going on with this test. One is that I believe add_combinational_illegal_observation_in_conditional is actually being picked up by the monitor as a valid trace, despite being illegal from the perspective of the interpreter. If it were flagged as illegal, we wouldn't get the following downstream error:

Root cause: to_protocol_application in monitor/src/interpreter.rs did unwrap_or_else(|| panic!(...)) when looking up an argument in args_mapping.

Quick Fix: Missing args are serialized as "?" instead of panicking. This could also be serialized as "X"? A protocol might have an argument it never uses, and that shouldn't be an error, I think. Let me know if I am wrong. This failure in general definitely requires more investigation. Regardless of the true upstream fix for the add_combinational test, I think it is reasonable for people to write valid protocols with an unused arg, and the monitor could deal with that more gracefully than it does now, unless unknown params cause other issues for monitor tractability.
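
For illustration, the placeholder behavior can be modeled in a few lines of Python. This is only a sketch of the idea, not the Rust code in monitor/src/interpreter.rs; the function name and the dictionary-based `args_mapping` are assumptions made for the example.

```python
def format_protocol_application(name, params, args_mapping, placeholder="?"):
    """Render `name(arg, ...)`, using a placeholder for unmapped params.

    params: parameter names in declaration order.
    args_mapping: values recovered from the trace, keyed by parameter name.
    A parameter missing from args_mapping is printed as the placeholder
    instead of raising an error.
    """
    rendered = [str(args_mapping.get(p, placeholder)) for p in params]
    return f"{name}({', '.join(rendered)})"


# `b` was never mapped to a trace value, so it is printed as "?".
print(format_protocol_application("add", ["a", "b", "s"], {"a": 1, "s": 3}))
```

Switching the placeholder to "X" would only require passing `placeholder="X"`.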

3. Monitor kills scheduler when a finished thread has slower siblings still running

Affected tests: both_threads_pass.tx.

Root cause: validate_finished_and_failed_threads in monitor/src/scheduler.rs returned an error if a thread finished but sibling threads from the same start cycle were still in the next queue. This is wrong when protocols have different lengths (e.g., add finishes in 2 cycles but wait_and_add takes longer). Ernest's meta scheduling thing is able to handle keeping the other "slower" trace around, but the current monitor logic was erroring instead. So, I just deleted that block of code. Now, both traces become valid.

Quick Fix: Instead of returning SchedulerError::NoTransactionsMatch, move the slower sibling threads from next to failed.

4. Empty blocks in monitor cause premature exits

Affected tests: passthrough_combdep.tx.

Root cause: In monitor/src/interpreter.rs, evaluate_stmt for Stmt::Block with an empty body returned Ok(None) (signaling "thread is done"), but it should have returned Ok(self.next_stmt_map[stmt_id]) to continue to the next statement in the parent scope. An empty if branch like if (cond) { } else { } would cause the thread to terminate early.

Actual Fix: Changed empty Block handling to use next_stmt_map instead of returning Ok(None). This mirrors the interpreter logic. I think the same bug existed in the interpreter until I found and patched it a few months ago, which is probably why the monitor logic is off.
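
A simplified Python model of the corrected control flow is shown below. It is a sketch under assumptions: statements are plain integer ids, `block_bodies` and the function name are invented for the example, and entering a non-empty block at its first statement is assumed rather than taken from the Rust source.

```python
from typing import Dict, List, Optional


def next_after_block(
    stmt_id: int,
    block_bodies: Dict[int, List[int]],
    next_stmt_map: Dict[int, Optional[int]],
) -> Optional[int]:
    """Return the id of the statement to run next, or None if the thread is done.

    A non-empty block continues with its first statement (assumption).
    An empty block falls through to whatever follows it in the parent
    scope, as recorded in next_stmt_map, instead of ending the thread.
    """
    body = block_bodies[stmt_id]
    if body:
        return body[0]
    return next_stmt_map[stmt_id]


# Block 10 is empty and is followed by statement 30 in its parent scope.
block_bodies = {10: [], 20: [21, 22]}
next_stmt_map = {10: 30, 20: 40}
print(next_after_block(10, block_bodies, next_stmt_map))
```

With the old behavior, the empty block would have returned None here and the thread would have terminated before reaching statement 30.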

5. Protocols that don't end with step() should be discarded

Affected tests: both_threads_pass.tx (protocol add_doesnt_end_in_step ends with fork() instead of step()).

Root cause: When a thread finishes execution (Ok(None) in run_thread_till_next_step), the monitor unconditionally added it to the finished queue. Ill-formed protocols that don't end with step() would get treated as successful matches.

Quick Fix: Added a check in run_thread_till_next_step: if the last executed statement isn't Stmt::Step, the thread is moved to failed instead of finished.
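
The check can be modeled in Python as below. This is a sketch only: representing executed statements as a list of kind strings and the function name are assumptions for the example, not how run_thread_till_next_step is written.

```python
from typing import List


def classify_completed_thread(executed_kinds: List[str]) -> str:
    """Decide which queue a thread goes to once it has no statements left.

    executed_kinds: kinds of the statements the thread executed, in order
    (for example "assign", "fork", "step", "assert_eq").
    The thread counts as a match only if the last executed statement was
    a step; otherwise it is treated as ill-formed and moved to `failed`.
    """
    if executed_kinds and executed_kinds[-1] == "step":
        return "finished"
    return "failed"


print(classify_completed_thread(["assign", "step"]))          # ends in step()
print(classify_completed_thread(["assign", "step", "fork"]))  # ends in fork()
```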

A weird side effect of this was that this rule also affects the stall protocol in the AXI stream tests. The stall protocol has assertions after its step():

prot stall<DUT: AXISManager>(out data: u32, out last: u1) {
    ...
    step();
    assert_eq(DUT.i_tdata, data);   // post-step assertion
    assert_eq(DUT.i_tlast, last);   // post-step assertion
}

I am sure these tests were written as such for a reason, but I'm confused as to why they yield valid results if the Protocols are not well-formed.

@Nikil-Shyamsunder changed the title from "Round-trip testing" to "Round-trip testing POC" on Feb 13, 2026
"Thread {} (`{}`) finished but there are other threads with the same start time ({}) in the `next` queue, namely {:?}",
// ...any other threads from the same start cycle still in `next`
// are slower siblings that lost the race — move them to `failed`
let sibling_names: Vec<String> = self
Collaborator

With #174 we now want to allow for different matching traces. Instead of cutting them off, we report all of them, which is why you can see things like Trace 0 and Trace 1 in the output.
We probably need to think of a better way to integrate multiple traces in the output format so that it can remain compatible with the interpreter. The easiest would probably be to define a keyword that signals the start of a new trace, so in the interpreter, you could then reset everything and execute the new trace.
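
As a rough illustration of the separator idea, the sketch below splits monitor output into per-trace lists of lines. It assumes headers of the form `Trace N:` on their own line, matching what the current output appears to print; the actual keyword is still to be decided, and the function name is hypothetical.

```python
import re
from typing import List

# Assumed header format, based on the "Trace 0" / "Trace 1" lines in the output.
TRACE_HEADER = re.compile(r"^Trace \d+:$")


def split_traces(monitor_output: str) -> List[List[str]]:
    """Group non-empty output lines by the trace header that precedes them."""
    traces: List[List[str]] = []
    current = None
    for raw in monitor_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TRACE_HEADER.match(line):
            # A header starts a fresh trace; the consumer would reset here.
            current = []
            traces.append(current)
            continue
        if current is None:
            # Output without any header is treated as a single trace.
            current = []
            traces.append(current)
        current.append(line)
    return traces
```

Each inner list could then be executed independently, with the interpreter state reset between traces.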

Collaborator Author

@Nikil-Shyamsunder Nikil-Shyamsunder Feb 14, 2026

Agreed, though I should point out the tests that led me to this change were failing to produce any trace in the monitor. I think this was just a small thing that wasn't removed in #174. I've removed my logic as well as the old monitor logic and tested; I checked that the monitor is able to produce multiple traces now for this test.

stall(7, 0) // [time: 1012.5ns -> 1037.5ns] (thread 46)
reset() // [time: 1062.5ns -> 1087.5ns] (thread 48)
reset() // [time: 1087.5ns -> 1100ns] (thread 49)
Trace 2:
Collaborator

Here you can see how your change means that the monitor only outputs a single trace instead of three different ones. However, we do want the monitor to output 3 traces for this example.

Collaborator Author

@Nikil-Shyamsunder Nikil-Shyamsunder Feb 14, 2026

Getting rid of the change in the above comment got us back up to two traces. The problem with getting the third trace is that the stall protocol didn't end in step(), so traces with it were getting thrown out for well-formedness reasons. Either we keep that check for interpreter-monitor parity, or we drop it, update the well-formedness checks, and modify the interpreter to handle these protocols.

I'm noticing a few of the monitor protocols, particularly for stalls, don't end in fork(); step() and just end with assertions. I'm wondering if this is an intentional exception to the well-formedness checks? When I add a fork(); step(); back in, I lose one of the three traces from the output. The reason is that the stall is now treated as a 2-cycle protocol, so the '1 cycle stall->1 cycle stall' trace disappears and we go from 3 possible traces to just 2...

If the intention is to relax the well-formedness checks to allow only assertions after the last step(), we can do that.
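
One possible shape for such a relaxed rule, sketched in Python. This is an assumption-laden illustration: statement kinds are modeled as strings, the function and flag names are invented, and nothing here reflects an agreed-upon design.

```python
from typing import List


def ends_well_formed(stmt_kinds: List[str], allow_trailing_asserts: bool = False) -> bool:
    """Check that a protocol body ends in step().

    With allow_trailing_asserts=True, any run of assert_eq statements at
    the very end is ignored before checking for the final step(), so
    `step(); assert_eq(...); assert_eq(...)` is accepted.
    """
    kinds = list(stmt_kinds)
    if allow_trailing_asserts:
        while kinds and kinds[-1] == "assert_eq":
            kinds.pop()
    return bool(kinds) and kinds[-1] == "step"


stall_like = ["assign", "step", "assert_eq", "assert_eq"]
print(ends_well_formed(stall_like))                               # strict rule
print(ends_well_formed(stall_like, allow_trailing_asserts=True))  # relaxed rule
```

Under the relaxed rule a body ending in fork() is still rejected, since only trailing assertions are skipped.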

Contributor

I told Nikil this was probably a mistake on my part when I hand-wrote the protocols since I didn't run them through the interpreter -- I will investigate this!

@@ -0,0 +1,32 @@

=== Round-trip results ===
Collaborator

Should this be committed to the repo? Or do you want to add the filename to the .gitignore?

monitor_cmd, shell=True, cwd=base_dir,
capture_output=True, text=True,
)
if result.returncode == 0:
Collaborator

Do you plan on comparing the content of the .tx file that the monitor produces to the original .tx file?
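
A content comparison would likely need to normalize both sides first. The sketch below is one hedged way to do that: it assumes the only formatting differences are trailing `//` annotations (such as the time and thread comments the monitor prints) and whitespace, which may not cover every difference between the two formats. The function name is hypothetical.

```python
from typing import List


def normalize_tx(text: str) -> List[str]:
    """Reduce .tx-style text to a list of bare calls for comparison.

    Drops everything after `//` on each line, skips blank lines, and
    removes all whitespace so `stall(7, 0)` and `stall(7,0)` compare equal.
    """
    calls = []
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()
        if line:
            calls.append("".join(line.split()))
    return calls


original = "reset()\nstall(7,0)\n"
monitored = (
    "reset() // [time: 1062.5ns -> 1087.5ns] (thread 48)\n"
    "stall(7, 0) // [time: 1012.5ns -> 1037.5ns] (thread 46)\n"
)
print(normalize_tx(original) == normalize_tx(monitored))
```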

@ngernest
Contributor

As a heads-up, I created #181 to help make the round-trip tests easier. I would recommend merging #181 before this one!

3 participants